We present two new constructions for the Toffoli gate which substantially reduce resource costs 
in fault-tolerant quantum computing. The first contribution is a Toffoli gate requiring Clifford 
operations plus only four $T = \exp(i\pi\sigma^z/8)$ gates, whereas conventional circuits require seven T gates. 
An extension of this result is that adding n control inputs to a controlled gate requires 4n T gates, 
whereas the best prior result was 8n. The second contribution is a quantum circuit for the Toffoli 
gate which can detect a single error occurring with probability p in any one of eight T gates 
required to produce the Toffoli. By post-selecting circuits that did not detect an error, the posterior 
error probability is suppressed to lowest order from 4p (or 7p, without the first contribution) to $28p^2$ 
for this enhanced construction. In fault-tolerant quantum computing, this construction can reduce 
the overhead for producing logical Toffoli gates by an order of magnitude. 



I. INTRODUCTION 

Fault-tolerant quantum computing is the effort to design
quantum information processors which are resilient to
sufficiently small (but nonzero) probability of failure
in any individual component [1–5]. Enhanced reliability
comes at the cost of redundancy, and recent study in this
area has focused on minimizing the overhead, or additional
resource costs, associated with converting a perfect
quantum operation into a form compatible with error
correction [3–5]. This work focuses on the Toffoli gate, which
appears in both reversible-classical and quantum logic
and which may be defined as
$\mathrm{Tof}|x,y,z\rangle = |x,y,z \oplus xy\rangle$,
for x, y, z being binary variables. Unlike many quantum
gates, the quantum Toffoli gate has a classical analogue,
so it is favored as a building block for importing
more complex classical operations, such as binary
arithmetic, into quantum algorithms like Shor's factoring
algorithm m El [7] and quantum simulation [H [HI E]. For
these reasons, the Toffoli gate is critically important to
quantum computing in general, and improvements in the
design of the Toffoli gate make the realization of
large-scale quantum computation more tractable.
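
The classical action in the definition above can be checked directly on the eight basis states. The following is a minimal sketch, not part of the original construction; the function name `toffoli` is a hypothetical helper chosen for illustration.

```python
from itertools import product


def toffoli(x: int, y: int, z: int) -> tuple:
    """Classical action of the Toffoli gate: flip z only when x = y = 1."""
    return x, y, z ^ (x & y)


basis = list(product((0, 1), repeat=3))
images = [toffoli(*bits) for bits in basis]

# A reversible gate must permute the basis states.
is_permutation = sorted(images) == basis
# Applying the gate twice should restore every input.
is_self_inverse = all(toffoli(*toffoli(*bits)) == bits for bits in basis)

for bits, out in zip(basis, images):
    print(bits, "->", out)
print("permutation:", is_permutation, "self-inverse:", is_self_inverse)
```

Because the map is a permutation of bit strings, the same table defines a unitary on the computational basis, which is why classical arithmetic built from this gate carries over to quantum circuits.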

Several researchers have studied circuit constructions
for the Toffoli gate. The most oft-cited implementation
is probably the one on page 182 of Ref. [2], which may
have been derived from Ref. [10]. As can be seen in
Ref. [5], the Toffoli gate is decomposed into smaller
quantum gates, each of which can be made fault-tolerant by
conventional means [T]. The most nettlesome of these is
the $T = \exp(i\pi\sigma^z/8)$ gate, which is much more expensive
in both time and space resources to produce [31 SI [TTHTB];
notably, the Toffoli circuit in Ref. [2] uses seven T gates.
In fact, Ref. [10] contains a construction nearly identical
to one derived here (we use four T gates), except
for an undesirable $(-1)$ phase on one output state (we
show how to correct this with modest effort). However,
"complete" implementations of the Toffoli gate without
a phase error have used seven T gates in the literature
to date. Amy et al. studied classical search methods for
decomposing gates like Toffoli into fault-tolerant
primitives [17], and Selinger investigated circuit constructions
with particular emphasis on T-gate count and depth,
where the latter metric allows parallel T gates on
different qubits [18]. We use Selinger's work as our starting
point, as we turn his almost-Toffoli gate into a proper
Toffoli gate, using four T gates and some quantum
teleportation. Finally, the importance of this topic has
attracted the attention of other researchers, and Eastin has
independently discovered equivalent results [19].

This paper presents two important results. First,
Section II describes how to implement the Toffoli gate with
only four T gates and Clifford-group operations [2 HU].
Second, Section III introduces a Toffoli construction
requiring eight T gates that can detect an error in any
single T gate. This new circuit is an important
development for fault-tolerant quantum computing, because
it relaxes the requirements on high-fidelity T gates that
are expensive to produce; however, the circuit is
probabilistic, and we discuss its proper usage. Section IV
presents some analysis of the resource costs and error
rates for these circuits. The paper concludes with a brief
discussion of the impact these results have on large-scale
quantum computing.



II. TOFFOLI USING JUST FOUR T GATES 

In fault-tolerant quantum computing, the most difficult
quantum gates to produce are non-Clifford gates.
The Hadamard gate $H = (1/\sqrt{2})(\sigma^x + \sigma^z)$, the phase
gate $S = \exp(i\pi\sigma^z/4)$, and the CNOT gate are generators
for the Clifford group, as any gate in this group can be
produced by combinations thereof, up to a global phase
that we ignore. However, at least one gate outside the
Clifford group is required for universal quantum computing.
The T gate is often selected because it is the easiest
to produce; however, as we explain below, "easy" is
relative, and this gate is still quite expensive in computing
resources.

FIG. 1. A circuit construction for Toffoli using four T gates.
(a) The Toffoli* circuit by Selinger [18] that is almost a Toffoli,
with the difference being the controlled-$S^\dagger$ operation. (b) Our
circuit combines Toffoli* with a phase correction and
teleportation to produce an exact Toffoli gate. The measurement is
in the $\sigma^x$ basis, and the double vertical lines indicate that the
controlled-Z correction is conditioned on the measurement
result being $|1\rangle$.

FIG. 2. An error-detecting Toffoli gate. The measurement
is in the basis, and obtaining result $|1\rangle$ indicates an error
was detected, so the qubits should be discarded.

In most quantum codes, including the surface code [21],
non-Clifford gates are produced using an ancilla state
that is injected into the circuit [2l[2^. As this ancilla is 
produced in a faulty manner, it must be purified through 
magic state distillation [THIIlj. The handful of rounds 
of state distillation required to reach the ~$10^{-12}$ error
rates required for quantum algorithms are considerably 
expensive, such that a single T gate requires ~100× the
circuit volume (product of qubits and time steps) of a
CNOT or H gate [4], making its production the dominant
cost among fault-tolerant gate primitives. This poses an 
issue for quantum computing, as very many T gates in 
the form of Toffoli gates are required for typical quantum 
algorithms like integer factoring or quantum simulation. 
The first Toffoli gate construction we present uses four 
T gates instead of seven, thereby reducing the overhead 
due to state distillation. 

Let us denote the Toffoli* gate as the operation in
Figure 1(a), which requires four T gates and was introduced
by Selinger [18]. Toffoli and Toffoli* differ only by a
controlled-$S^\dagger$ gate between the control qubits x and y.
Beginning with Toffoli*, we need only an ancilla qubit, a
phase gate S, and teleportation to implement the exact
Toffoli gate, as shown in Figure 1(b). We first apply the
Toffoli* using the same controls as the desired Toffoli but
with an ancilla $|0\rangle$ as target. The erroneous
controlled-$S^\dagger$ is corrected by a simple S gate applied to the ancilla.
Afterwards, the CNOT and measurement teleport the doubly
conditional NOT operation encoded in the ancilla to
the target qubit of the desired Toffoli. The measurement
result determines whether a corrective gate of controlled-Z,
which is in the Clifford group, is required to correct
a $(-1)$ phase resulting from measurement back-action.
One can readily verify that only four T gates are required
in this procedure [21 [2^. Note that the inverse gate $T^\dagger$
requires the same ancilla-based teleportation circuit as
T, so these gates are equivalent in state-distillation cost
and construction.
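
The phase bookkeeping in the teleportation procedure described above can be traced amplitude by amplitude. The sketch below is illustrative only and rests on stated assumptions: S is taken as diag(1, i) up to a global phase, Toffoli* is modelled as an ideal Toffoli that also imprints a factor of -i when both controls are 1 (the sign convention is an assumption), and the ancilla measurement outcome is supplied as an input rather than sampled. All function names are hypothetical.

```python
import math


def teleported_toffoli(state, outcome):
    """Push a 3-qubit state through the ancilla-based circuit.

    state   : dict mapping (x, y, z) bit tuples to complex amplitudes
    outcome : 0 or 1, the assumed X-basis measurement result on the ancilla
    """
    out = {}
    for (x, y, z), amp in state.items():
        a = x & y                            # Toffoli* on ancilla |0> writes a = xy
        amp *= (-1j if a else 1)             # unwanted controlled-S^dagger phase (assumed sign)
        amp *= (1j if a else 1)              # S on the ancilla cancels that phase
        z_out = z ^ a                        # CNOT from the ancilla onto the target
        # X-basis measurement: <+|a> = 1/sqrt(2), <-|a> = (-1)^a / sqrt(2)
        amp *= (1 if outcome == 0 else (-1) ** a) / math.sqrt(2)
        if outcome == 1:
            amp *= (-1) ** (x & y)           # controlled-Z correction on the controls
        key = (x, y, z_out)
        out[key] = out.get(key, 0) + amp
    norm = math.sqrt(sum(abs(v) ** 2 for v in out.values()))
    return {k: v / norm for k, v in out.items()}


def ideal_toffoli(state):
    """Reference: the exact Toffoli applied to the same state."""
    return {(x, y, z ^ (x & y)): amp for (x, y, z), amp in state.items()}


def example_state():
    """An arbitrary normalized test state with eight distinct nonzero amplitudes."""
    raw = {(x, y, z): complex(1 + x + 2 * y, z - x)
           for x in (0, 1) for y in (0, 1) for z in (0, 1)}
    norm = math.sqrt(sum(abs(v) ** 2 for v in raw.values()))
    return {k: v / norm for k, v in raw.items()}


def matches_ideal(outcome):
    got = teleported_toffoli(example_state(), outcome)
    want = ideal_toffoli(example_state())
    return all(abs(got[k] - want[k]) < 1e-12 for k in want)


print(matches_ideal(0), matches_ideal(1))
```

Both measurement outcomes reproduce the exact Toffoli in this model, and the only non-Clifford resource consumed is the Toffoli* itself.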

The construction in Figure 1(b) can also be used to
add control-qubit inputs to an existing controlled-G gate,
where G is any unitary. Replace the CNOT in Figure 1(b)
with controlled-G (targeting however many qubits G acts
on), and the result is controlled-controlled-G. By
iterating this procedure, one can add n controls to
controlled-G using 4n T gates. The best prior result required 8n
T gates dg.



III. ERROR-DETECTING TOFFOLI CIRCUIT 

Whereas the previous section reduced the number of
T gates needed to make a Toffoli, this section addresses
the resource-cost problem differently by making each
T gate less expensive. The cost of a T gate scales
inversely with the probability p of it having an undetected
error, with a relationship where circuit volume (qubits ×
gates) is $O(\mathrm{poly}(\log(1/p)))$. We introduce a new Toffoli
gate that can detect an error in any one of eight T gates.
As a result, the effective error probability of the
Toffoli gate is $28p^2$ instead of $4p$ (we only consider lowest
non-vanishing order throughout this paper since $p \ll 1$).
Even though twice as many T gates are needed, they can
tolerate larger error rates, so they are substantially less
expensive to produce than would otherwise be necessary.
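
The lowest-order figures quoted above follow from counting error patterns among eight independent T gates: 8 ways to place one error and 28 ways to place two. The sketch below evaluates the full binomial expressions under a pessimistic assumption that is not taken from the text, namely that every pattern with two or more errors escapes detection and corrupts the gate. The function name is hypothetical.

```python
from math import comb


def error_detecting_stats(p: float, n_gates: int = 8):
    """Return (probability an error is detected, posterior error after post-selection)."""
    def prob_k(k: int) -> float:
        # Probability that exactly k of the n_gates independent T gates are faulty.
        return comb(n_gates, k) * p ** k * (1 - p) ** (n_gates - k)

    detected = prob_k(1)                                        # a single error is always caught
    undetected = sum(prob_k(k) for k in range(2, n_gates + 1))  # pessimistic assumption
    posterior = undetected / (1 - detected)                     # condition on passing the check
    return detected, posterior


p = 1e-3  # illustrative value only
detected, posterior = error_detecting_stats(p)
# Both ratios approach 1 as p -> 0, recovering 8p and 28p^2.
print(detected / (8 * p), posterior / (28 * p ** 2))
```

The coefficient 28 is simply the number of gate pairs, `comb(8, 2)`, which is why doubling the T-gate count still yields a quadratic suppression.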

The error-detecting Toffoli circuit is rather simple to
derive. It consists of two Toffoli* gates acting on a target
qubit which is in a bit-flip code [5], as shown in
Figure 2. The gate with reversed triangles is the inverse
operation (Toffoli*)$^\dagger$. Importantly, the controlled-$S^\dagger$ and
controlled-$S$ gates acting on the same qubits x and y are
inverse operations, so they cancel. A logically equivalent
decomposition into T gates is shown in Figure 3; this
circuit is convenient for analyzing how errors propagate.
We assume that $|0\rangle$ preparation, H, CNOT, and measurement
operations are perfect, because fault-tolerant error
correction for these processes is economical compared to
T gates. A single $\sigma^z$ error in any of the T gates will
necessarily propagate to the syndrome measurement for this
bit-flip code, as indicated by the red dashed lines. Upon
such an event, all of the qubits are discarded. Note that
$\sigma^x$ errors, if present, do not propagate anywhere since
they commute with the CNOT gates; they have no effect
on the Toffoli gate.

FIG. 3. An error-detecting Toffoli gate. The red dashed lines
indicate how any single error will propagate to the readout
qubit. The measurement is in the basis, and obtaining
result $|1\rangle$ indicates an error was detected, so the qubits should
be discarded. As long as the ancilla qubits are initialized
perfectly to $|0\rangle$ and the CNOT and H gates have no errors, then
only $\sigma^z$ errors in the T gates matter, as $\sigma^x$ errors cannot
propagate to data qubits. If the probability of a $\sigma^z$ error in
each T gate is i.i.d. Bernoulli(p), then the success probability
is $1 - 8p$ and the a posteriori error probability is $28p^2$, to lowest
order in p.

The circuit in Figure 3 must be discarded upon a
detected error event, which happens with probability 8p.
If this circuit were connected by entanglement to other
qubits in a quantum algorithm, all qubits would have to be
discarded, and the algorithm would fail. To avoid this scenario,
one can produce a Toffoli ancilla ^j. If the circuit fails 
because of a detected error, then the qubits are discarded, 
but no far-reaching damage occurs since this faulty cir- 
cuit is not entangled to any data qubits. Conditioned on 
the circuit succeeding, the ancilla is teleported into data 
qubits to enact a Toffoli gate, using only Clifford gates 
and measurement, as shown in Figure 4. Using a
representative value for T-gate error as $p = 10^{-7}$ (we consider
such a scenario in Section IV), the failure probability for
preparing the Toffoli ancilla is a modest $8 \times 10^{-7}$, which
negligibly increases the number of times such preparation
circuits must be repeated.
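
Since a failed ancilla preparation is simply repeated, the number of attempts follows a geometric distribution. The sketch below uses the lowest-order success probability 1 - 8p from the Figure 3 caption; the function name and the sample values of p are illustrative assumptions.

```python
def expected_attempts(p: float, n_gates: int = 8) -> float:
    """Mean number of ancilla preparations until one passes the error check."""
    success = 1 - n_gates * p   # lowest-order success probability, 1 - 8p
    return 1 / success          # mean of a geometric distribution


for p in (1e-3, 1e-5, 1e-7):    # illustrative values only
    print(p, expected_attempts(p))
```

For small p the mean stays within a fraction of a percent of one attempt, so repeat-until-success adds essentially no overhead to the ancilla factory.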



IV. RESOURCE ANALYSIS 

Comparing resource costs between the naive Toffoli
gate using seven T gates and our construction using four
is straightforward. The latter requires about half
the resources of the former, under our assumption that
T gates are the dominant cost. There is also a modest
improvement in the Toffoli error rate ($7p$ becomes
$4p$). However, in fault-tolerant quantum computing, this
result is likely overshadowed by the error-detecting
construction.

FIG. 4. Proper use of an error-detecting Toffoli ancilla. All
measurements are in the basis. The ancilla-production
circuit in the grey box in upper left is probabilistic. A
measurement result of $|1\rangle$ indicates the circuit failed, in which
case the qubits are discarded. Because the Toffoli ancilla is
not coupled to any other part of the quantum computation,
its production can be repeated until success. The subsequent
CNOT gates and measurements teleport data qubits through
the Toffoli gate encoded in the ancilla (cf. p. 488 of Ref. [2]).
Clifford-group gates are applied conditional on measurements
showing outcome $|1\rangle$, as indicated by the parallel double lines.

Doubling the number of T gates from four to eight to
achieve $O(p^2)$ Toffoli error rate is usually the correct
decision. The reason is that this approach is more economical
than increasing the accuracy of the T gates through
further magic-state distillation (or other fault-tolerant
procedures). Bravyi and Haah present a conjecture in the
context of magic-state distillation, stating that to
produce one magic state with error $O(p^2)$ requires at least
two input states with error p; hence, the resources needed
to increase T gate accuracy to $O(p^2)$ at least double, and
in all practical cases known to this author, the overhead
factor is larger than two (an example case is considered
below). Moreover, there is no known protocol which
saturates this bound. Multilevel distillation comes arbitrarily
close as $p \to 0$, but this limiting case is not always
relevant for finite p, and multilevel protocols require large
and complex circuits [16].

Under conditions relevant to quantum computing, the
error-detecting Toffoli in Figure 3 can reach the low
error rates required for quantum algorithms with one less
round of state distillation, leading to as much as an
order-of-magnitude reduction in the resources required to
produce a fault-tolerant Toffoli gate. For example, suppose
that we wish to produce a Toffoli gate with error
probability below $10^{-12}$. We presume the "raw" T gate
ancilla has a $\sigma^z$-error probability of $10^{-2}$. Using the results
in Ref. [13], the simple Toffoli gate would require four
T gates distilled to $p = 10^{-13}$ using a hybrid scheme
of one round of Bravyi-Kitaev (BK) distillation and two
rounds of Meier-Eastin-Knill (MEK) distillation, at an
average total cost of 1744.8 raw states. Conversely, the
error-detecting Toffoli would require just one round each
of BK and MEK distillation circuits for each of the eight
T gates distilled to $p = 10^{-7}$, for a total average cost
of 697.6 raw states. The resource savings factor is 2.5×,
just in terms of number of undistilled states needed for
distillation. In practice, the resource savings is
amplified by another factor of 2× because one less round of
distillation is needed (fewer gates means smaller circuit
volume). Additionally, the state-distillation sub-circuits
for the error-detecting Toffoli gate can use weaker error
correction (i.e. lower code distance, by about a factor
of two) than the same preparation circuits for the
simple Toffoli, which translates to fewer qubits and gates at
the hardware level [22]. Relative to the Toffoli circuit
using seven T gates, there is an additional savings factor
of 7/4. Therefore, the error-detecting circuit reduces
total overhead for non-Clifford gates by up to an order of
magnitude in this representative example.
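
The savings factors quoted in this example can be combined in a few lines. The sketch below simply multiplies them, which assumes the factors compound independently; that is an assumption made here for illustration. The unquantified gain from lower code distance is left out. Variable names are hypothetical.

```python
RAW_STATES_SIMPLE = 1744.8      # four T gates, one BK + two MEK rounds (quoted above)
RAW_STATES_DETECTING = 697.6    # eight T gates, one BK + one MEK round (quoted above)

raw_state_factor = RAW_STATES_SIMPLE / RAW_STATES_DETECTING  # the quoted ~2.5x
round_factor = 2.0              # one less round of distillation (quoted above)
seven_vs_four = 7 / 4           # relative to the seven-T-gate Toffoli circuit

# Assumes the three factors multiply independently.
total_factor = raw_state_factor * round_factor * seven_vs_four
print(round(raw_state_factor, 2), round(total_factor, 2))
```

The product lands a little below ten, consistent with the "up to an order of magnitude" phrasing once the additional code-distance savings are taken into account.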

It is also noteworthy that if "raw" T gates can be
produced with error rate $p = 10^{-4}$, then the error-detecting
Toffoli has a posterior error probability of approximately
$3 \times 10^{-7}$. This would enable modest quantum
computations using about $10^6$ Toffolis, such as the multiplication
of two 1000-bit numbers, without the need for
resource-intensive magic-state distillation.



V. CONCLUSIONS 

The Toffoli gate is a ubiquitous operation in quantum
computing, as it plays a key role in many quantum algo- 
rithms. However, quantum computers that realize these 
algorithms are still out of reach. In the meantime, en- 
gineering a system capable of large-scale, fault-tolerant 
quantum computation demands that quantum computer 
architects minimize computing resource costs in terms of 
execution time and machine size. The constructions in 
this paper substantially reduce the circuit volume for the 
fault-tolerant Toffoli gate when one considers how expen- 
sive each non-Clifford gate T is to produce. In the case 
of the error-detecting Toffoli gate, the resource savings is 
an order of magnitude in a representative example with 
T-gate error p = 0.01. The improved fault-tolerant Tof- 
foli gate brings large-scale quantum computing closer to 
realization. 